Development and test–retest reliability of a screening tool for axial spondyloarthritis

Background People with axial spondyloarthritis (axSpA) suffer from lengthy diagnostic delays of ~7 years. The use of screening tools to identify axSpA patients in primary care can reduce diagnostic delays by facilitating early referral to rheumatologic care. The purpose of this study was to examine the psychometric properties of a potential screening tool for patients with axSpA.
Method Content validity was evaluated by soliciting feedback from 7 rheumatologists regarding the relevance and content representativeness of the proposed screening questions. For the test-retest study, participants ≥18 years of age with chronic back pain (≥3 months) without a diagnosis of mechanical or inflammatory back pain (n = 91) were e-recruited through ResearchMatch. Participation included completing identical baseline and follow-up questionnaires ~14 days apart. Weighted quadratic kappa was used to measure test-retest reliability between the two ratings of the ordinal scales. Construct validity was examined using exploratory factor analysis (EFA) and items with factor loadings ≥0.6 were extracted. Scale dimensionality and simplified factorial solutions were measured using Kaiser’s criteria (eigenvalue >1). Cronbach’s alpha was used to measure internal consistency.
Results Most participants were women, non-Hispanic white, and had at least some college education, with a mean age of 45 years. On average, the age at onset of back pain was 31 years. Eleven questions yielded test–retest reliabilities ranging from 0.60 to 0.76. Results from EFA extracted two factors relating to: 1) how pain affects daily life functioning and 2) whether pain improves with movement. Internal consistency was high for questions evaluating how pain affects life, with a Cronbach’s alpha of 0.81. Following assessment for validity and reliability, the questionnaire was revised to create the 6-item screening tool.
Conclusions The 6-item SpA-SED screening tool designed to identify potential cases of axSpA was found to have good test–retest reliability and high internal consistency.


Introduction
Axial spondyloarthritis (axSpA) is characterized by chronic back pain and stiffness, limited axial skeletal mobility, and fatigue [1,2]. The average delay in diagnosing an individual with axSpA ranges from 7 to 10 years [3], with estimates as high as 13 years between symptom onset and diagnosis in the United States [4]. Reasons for delay in disease diagnosis include: 1) common and vague initial symptoms (e.g., back pain) that are non-specific and could be ascribed to other conditions [5,6], and 2) appearance of radiological changes later in the disease course of axSpA [7,8]. Delayed diagnosis deprives patients of the potential benefit of early treatment in slowing disease progression to avoid or delay serious disability [9,10].
Delays in diagnosis have also been attributed to late referral of patients by general practitioners to rheumatologists, since non-rheumatologist physicians in the US are less aware of axSpA [11,12]. Studies show that primary care physicians have difficulty differentiating inflammatory back pain (IBP) from the more common mechanical back pain [13,14] or are unaware of other features of spondyloarthritis that are important for differential diagnosis [12,14]. We previously conducted qualitative research with primary care providers in which they agreed that improvements in screening and early detection of axSpA are needed [15]. The Assessment of SpondyloArthritis International Society (ASAS) Inflammatory Back Pain Assessment: ASAS Expert Criteria screening questions have been used in primary care settings to screen for axSpA [15,16]. However, primary care providers considered that some of these questions were neither sensitive nor specific and needed improvement [15,16].
Previously, we derived potential screening questions from qualitative interviews that we conducted with patients who had chronic back pain and refined the wording of these questions to improve their clarity and ease of administration as screening questions to be implemented in primary care settings [17]. The purpose of this study was to evaluate the psychometric properties of these proposed screening questions to develop a prognostic screening tool for axSpA for implementation in primary care settings.

Methods
The protocol for the test-retest study of the SpondyloArthritis Screening and Early Detection (SpA-SED) Study was approved by the UMass Chan Medical School Institutional Review Board (IRB number: H00020620).

Research reported in this publication was supported by the National Center for Advancing Translational Sciences of the National Institutes of Health under award number UL1TR000161 (KLL). This work was also supported by a charitable contribution to the UMass Memorial Foundation from Timothy S. and Elaine L. Peterson (JK, SHL). This project was also supported by the SAA/Jane Bruckel Early Career Investigator in AxSpA Award (SHL). The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH. The funders provided support in the form of salaries for authors [KLL, JK, SHL, EY], but did not have any additional role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript. This does not alter our adherence to PLOS ONE policies on sharing data and materials.

Study design
The study was conducted in two phases. First, a test-retest study was conducted among patients with chronic back pain, but who did not have a clinical diagnosis of either mechanical or inflammatory back pain. Participants were asked to complete identical baseline and follow-up questionnaires approximately 14 days apart. These questionnaires included 19 potential items/questions for inclusion in the screening tool (S1 and S2 Files). Questionnaires were administered online via REDCap [18], with the link to the follow-up questionnaire being sent approximately 14 days after the baseline questionnaire had been completed.
Second, a content validity study was conducted among members of a panel of rheumatologists with expertise in axSpA using REDCap to solicit feedback about the content validity of the proposed screening questions. Each rheumatologist was asked to specify his or her level of agreement with the relevance and content representativeness of the proposed items using a 5-point Likert scale (1 = disagree to 5 = strongly agree) (S3 File) [19]. The knowledge gained complemented and helped to interpret the data collected from patients in the test-retest study.

Participant recruitment
Two groups of participants were recruited for the study: 1) patients with chronic back pain (≥3 months); and 2) rheumatologists with expertise in axSpA.
Patients. To be eligible for participation in on-line REDCap surveys, participants were required to: 1) have had back pain for at least 3 months (by self-report), but not have been given a clinical diagnosis of mechanical or inflammatory back pain; 2) have registered in ResearchMatch [20]; and 3) be at least 18 years of age. Patients were e-recruited through ResearchMatch, which is a disease-neutral, web-based recruitment registry to help match individuals who wish to participate in clinical research studies with researchers actively searching for volunteers throughout the United States [20].
Standard ResearchMatch procedures were followed to invite subjects to participate in our study. After receiving IRB approval, the study was registered on ResearchMatch and study details were entered. After IRB approval was verified by ResearchMatch, the research team was granted permission to access the database of potential study subjects. Individuals in the database who satisfied our eligibility criteria were identified. ResearchMatch then sent an IRB-approved recruitment message to these volunteers to inform them of the opportunity to participate in this study. After individuals had responded, giving permission to be contacted for our study, personal contact information was made available to the research team within the secure ResearchMatch system and these potential study participants were contacted on ResearchMatch. Patients who completed both the baseline and follow-up questionnaires were compensated for their participation with a $50 cash card.
Fig 1 displays the patient recruitment flow chart. We contacted 2,845 potentially eligible participants on ResearchMatch. Of those, 215 individuals agreed to participate, and 174 completed the baseline questionnaire. Of these 174 subjects, 40 did not meet eligibility because of the absence of chronic back pain. Among the eligible 134 participants who completed the baseline questionnaire, all were sent the follow-up questionnaire and 93 completed the follow-up questionnaire. Ultimately, 91 participants responded to both the baseline and follow-up questionnaires with complete data.
Rheumatologists. Seven rheumatologists were identified from among members of the Spondyloarthritis Research and Treatment Network (SPARTAN) who were known by the investigators to have expertise in axSpA and were invited to participate in the study. Written informed consent was obtained. Those rheumatologists participating in the research study were compensated for their time and effort with a $300 cash card.

Questionnaires
Each participant with chronic back pain completed an eligibility questionnaire at the beginning of the baseline survey (S1 File). The eligibility questionnaire confirmed whether the participant had suffered from back pain for ≥3 months and whether the back pain resulted from a specific incident. A background questionnaire, which collected information on patient demographics including sex/gender, age, race/ethnicity, and education level, was completed by eligible participants at the end of the baseline survey. Subjects were also asked if they would be willing to be re-contacted for future research.

Statistical analysis
For patient participants, descriptive statistics including means and standard deviations (SD) for continuous variables and percentages for categorical variables were calculated to describe patient characteristics. Psychometric properties of the proposed screening tool were examined including test-retest reliability, exploratory factor analysis, and Cronbach's alpha. Test-retest reliability was calculated by computing the percent agreement and kappa (κ) statistics between two administrations of the questionnaire to the same subject.
Overall, the analytic approach was an iterative process. First, weighted kappa was used to measure the test-retest reliability (i.e., agreement) between the two ratings of the ordinal scales [21,22]. With ordinal scales, weighted kappa coefficients are used where disagreements are weighted by the degree of discrepancy [21,23]. Quadratic weighted kappa is often recommended since it is equivalent to the product-moment correlation and the intraclass correlation coefficient under certain scenarios [22,24]. This metric typically varies from 0 (random agreement) to 1 (complete agreement). The kappa value of 0.6 was considered moderate test-retest reliability and was used as a cut-off to determine the selection of the proposed questions into the screening tool [25]. For questions with binary responses, the reliability was assessed using the Fleiss kappa coefficient [26,27].
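As an illustration of the quadratic weighted kappa described above, the following is a minimal sketch and not the authors' analysis code. It assumes ordinal responses coded as integers from 0 to n_categories - 1; the function name and the example responses are hypothetical. Disagreements are weighted by the squared distance between the two response categories, and the 0.6 cut-off from the text is applied to the result.

```python
from collections import Counter


def quadratic_weighted_kappa(baseline, followup, n_categories):
    """Quadratic weighted kappa for two ordinal ratings of the same respondents.

    baseline[i] and followup[i] are the responses of respondent i, coded as
    integers 0..n_categories-1 (an assumed coding, not taken from the study).
    """
    if len(baseline) != len(followup) or not baseline:
        raise ValueError("need two non-empty rating lists of equal length")
    if n_categories < 2:
        raise ValueError("need at least two response categories")
    n = len(baseline)
    joint = Counter(zip(baseline, followup))  # counts of (time 1, time 2) pairs
    row = Counter(baseline)                   # marginal counts at time 1
    col = Counter(followup)                   # marginal counts at time 2
    max_dist = (n_categories - 1) ** 2
    observed = 0.0  # weighted observed disagreement
    expected = 0.0  # weighted disagreement expected by chance
    for i in range(n_categories):
        for j in range(n_categories):
            weight = (i - j) ** 2 / max_dist
            observed += weight * joint[(i, j)] / n
            expected += weight * row[i] * col[j] / (n * n)
    if expected == 0:
        raise ValueError("kappa is undefined when all ratings share one category")
    return 1.0 - observed / expected


# Hypothetical responses to one 5-level item from eight respondents.
baseline = [0, 1, 2, 3, 4, 2, 3, 1]
followup = [0, 1, 2, 4, 4, 1, 3, 1]
kappa = quadratic_weighted_kappa(baseline, followup, n_categories=5)
retain_item = kappa >= 0.6  # cut-off described in the text
```

Because the weights grow with the square of the category distance, a response that shifts by one category between administrations is penalized far less than one that shifts across the whole scale.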
Exploratory factor analysis (EFA) was then conducted to examine the dimensionality of the screening tool. Questions with sufficient factor loading (e.g., a corrected item-total correlation (CITC) > 0.3, which is indicative of good fit) were included in the analysis [28]. Considering that our primary goal was to distill down the number of items needed to capture the underlying structure, only items with factor loadings ≥ 0.6 were extracted (factor loadings ≥ 0.5 suggest practical significance) [29]. The criteria used to determine scale dimensionality and simplified factorial solutions included Kaiser's criterion (eigenvalue > 1 rule) [30] and the scree plot [31].
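The eigenvalue > 1 rule can be sketched as follows. This is only an illustration of Kaiser's criterion applied to an item correlation matrix, not a full factor analysis and not the software used in the study. The correlation matrix is hypothetical, and the eigenvalues are computed with a small pure-Python cyclic Jacobi routine (a standard method for symmetric matrices) so that the example has no external dependencies.

```python
import math


def symmetric_eigenvalues(matrix, tol=1e-18, max_sweeps=100):
    """Eigenvalues of a symmetric matrix via cyclic Jacobi rotations."""
    a = [list(row) for row in matrix]
    n = len(a)
    for _ in range(max_sweeps):
        # Sum of squared off-diagonal entries; zero means the matrix is diagonal.
        off = sum(a[p][q] ** 2 for p in range(n) for q in range(p + 1, n))
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                diff = a[q][q] - a[p][p]
                if diff == 0.0:
                    theta = math.copysign(math.pi / 4, a[p][q])
                else:
                    theta = 0.5 * math.atan(2.0 * a[p][q] / diff)
                c, s = math.cos(theta), math.sin(theta)
                # Rotate columns p and q, then rows p and q, which zeroes a[p][q].
                for k in range(n):
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)


def kaiser_factor_count(correlation_matrix):
    """Number of factors retained under the eigenvalue > 1 rule."""
    return sum(1 for value in symmetric_eigenvalues(correlation_matrix) if value > 1.0)


# Hypothetical correlations among four items forming two unrelated pairs.
corr = [
    [1.0, 0.8, 0.0, 0.0],
    [0.8, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.8],
    [0.0, 0.0, 0.8, 1.0],
]
n_factors = kaiser_factor_count(corr)
```

In practice the eigenvalues would usually be read from statistical software output and inspected together with the scree plot, as the text describes.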
Lastly, Cronbach's alpha was used to measure internal consistency (i.e., scale reliability) of the screening tool, and to evaluate how closely the proposed screening questions were related [32]. Cronbach's alpha is computed by correlating the score for each scale item with the total score for each observation (usually individual survey respondents or test takers), and then comparing that to the variance for all individual item scores. The resulting α coefficient of reliability ranges from 0 to 1 in providing this overall assessment of a measure's reliability. A higher α coefficient indicates that more items have shared covariance and measure the same underlying concept [33].
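For illustration, Cronbach's alpha is commonly computed from item variances with the formula alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores), where k is the number of items. The sketch below implements that standard formula under the assumption that each respondent has a complete set of numeric item scores; the function name and the example scores are hypothetical and are not study data.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    if len(responses) < 2:
        raise ValueError("need at least two respondents")
    k = len(responses[0])
    if k < 2:
        raise ValueError("need at least two items")
    if any(len(r) != k for r in responses):
        raise ValueError("every respondent must answer every item")
    # Sample variance of each item across respondents.
    item_variances = [variance([r[i] for r in responses]) for i in range(k)]
    # Sample variance of each respondent's total score.
    total_variance = variance([sum(r) for r in responses])
    if total_variance == 0:
        raise ValueError("alpha is undefined when total scores do not vary")
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical scores for five respondents on three items.
scores = [
    [1, 2, 1],
    [2, 2, 3],
    [3, 4, 3],
    [4, 4, 5],
    [5, 5, 4],
]
alpha = cronbach_alpha(scores)
```

When items rise and fall together across respondents, the variance of the total score is large relative to the summed item variances, and alpha approaches 1.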
For the questionnaire used to assess content validity by rheumatologists, the overall percent agreement for these items was evaluated and summarized. The rheumatologists were asked to rank each question based on its significance and relevance to axSpA symptoms using a 4-point Likert scale. Options ranged from being not important to very important. Agreement on each question was based on the option that received the highest level of agreement (majority of rheumatologists agreeing that the question is somewhat important) with incorporation of feedback from rheumatologists.

Results
Table 1 shows the demographic characteristics of patient participants. Overall, the majority of participants were women (72.5%) and non-Hispanic white (79%), with a mean age of 45 years. On average, the age at onset of back pain was 30.7 years. Although 31% of all subjects had completed college, 42% of men but only 29% of women had some college or a technical school degree.

Test-retest reliability
The test-retest reliability for each proposed question is shown in Table 2. Eleven of the 19 items on the screener had relatively moderate correlations (quadratic kappa ≥ 0.6) across the two-week, test-retest period (Table 3). Those 11 questions yielded test-retest reliabilities with quadratic weighted kappas ranging from 0.60 to 0.76. Eight proposed questions with a quadratic kappa less than 0.60 were excluded from the screener (Table 4).

Exploratory factor analysis
Two items (age at initial onset of back pain, and presence of chronic back pain [Q1]) incorporated into one question on the screener were identified as potential screening questions in the baseline study and thus were excluded from the analysis. Preliminary analysis revealed that question 19 did not have sufficient factor loading (CITC < 0.3) and thus it was eliminated.

PLOS ONE
Exploratory factor analysis was subsequently performed on the remaining nine questions using a principal component factor analysis with a varimax rotation. Results from this analysis (Table 5) generated a two-factor model to explain the latent variable. Additionally, the scree plot showed two factors above the elbow (eigenvalues > 1), confirming that a two-factor solution is the best approach (Fig 2). Items that loaded highly on factor one included measures related to the impact of pain on daily life (Q3, Q9, Q11, Q13, and Q14); these were designated as "daily life functioning". Items relating to the effect of pain on movement (Q5 and Q7) loaded on factor two and were designated as "pain better with movement".

Cronbach's alpha
The instrument (including questions extracted from the exploratory factor analysis) was found to have satisfactory internal consistency with a Cronbach's alpha of 0.81. The Cronbach's alpha coefficient remained unchanged following the exclusion of two items (Q4 and Q15). Because two items (Q5 and Q7) that loaded on factor 2 (pain better with movement) were found to correlate highly with one another, we included only one of these two variables in the screener (Q5). Table 6 lists the six questions that were included in the final proposed screening tool.

Content validity
Most of the 21 questions that were evaluated for content validity were considered by rheumatologists to be between "somewhat important" and "very important." Three questions garnered mixed responses (Q8: Over the last 3 months, how often did your back pain get better with movement after waking? Q11: Over the last 3 months, how often did your back pain make it uncomfortable to sit for more than 2 hours? and Q21: Has anyone in your family had an auto immune disease?) and two questions were considered to be "insignificant/unimportant" (Q2: Over the last 3 months, how often did your back feel the same or worse? and Q10: Over the last 3 months, how often are you aware of your back pain when sitting for two hours or more?). The final six-item questionnaire (Table 6) was developed based upon feedback from both patients and the panel of rheumatologists to ensure that it incorporated the perspective of both groups.

Discussion
Findings from the test-retest study indicate that our proposed screener is fairly stable. The test-retest reliabilities were acceptable, with eleven items having quadratic weighted kappa values of ≥0.6, six items with values ≥0.5 and <0.6, and only two items with values <0.5. The exploratory factor analysis extracted two factors related to: 1) impact of pain on life and 2) impact of movement on pain. The internal consistency for questions evaluating the impact of pain on life was high, with a Cronbach's alpha of 0.81. Six questions were identified for incorporation into a screening tool based on a priori psychometric properties. These final six questions have good test-retest reliability and high internal consistency.
Lower back pain is a common symptom and a frequent reason for seeking medical care in primary care settings in the US [34,35]. Differentiating common, relatively minor low back pain from the less common IBP which leads to an axSpA diagnosis is challenging [9,35]. Several strategies have been used previously to identify patients with axSpA [36-41]. These include testing for a single screening parameter (e.g., IBP) or a combination of parameters (e.g., HLA-B27 positivity, sacroiliitis on imaging, response to non-steroidal anti-inflammatory drugs, positive family history for SpA), performing whole body MRI, or applying ASAS classification criteria for axSpA [36-38,40,41]. However, questions related to a single parameter (e.g., IBP) showed low to no diagnostic value in primary care settings [37] and axSpA classification criteria were not developed for use as a screener [36,42]. In addition, Weisman et al. developed and validated a screening tool which combined laboratory test results and patient questionnaires to differentiate potential cases of AS from mechanical back pain [42]. However, the generalizability of this tool is a potential concern, since all of the patients included in their validation study had an established diagnosis of AS. Furthermore, the questionnaire may not include language reflective of the patient experiences.

Table 6. List of questions included in the final proposed screening tool.
Q3. Over the last 3 months, how often did your back feel stiff during the first two hours after waking?
Q5. Over the last 3 months, how often did your back pain get better with movement?
Q9. Over the last 3 months, how often did your back pain make it uncomfortable to sit for more than 2 hours?
Q11. Over the last 3 months, how often do you avoid social activities because of your back pain?
Q13. Over the last 3 months, how often did your back pain make it difficult to sleep?
Q14. Over the last 3 months, how often did your back pain wake you up from sleep?
https://doi.org/10.1371/journal.pone.0269494.t006

Our prognostic screening tool improves upon these approaches. The proposed screening tool was informed by previous qualitative research and includes questions designed to test for a variety of axSpA symptoms [17]. Overall, we tested 19 items to capture multiple domains related to self-reported symptoms of axSpA, including IBP, stiffness, pain with movement, sleep, use of pain medications, other joint pain, and family history. For each domain, multiple questions were included that captured essentially the same underlying construct. Through a process of multiple iterations, only those items which scored well under domains that were most relevant to axSpA, according to published literature and our previous work [17,43-45], were included to develop a patient-friendly screener. For example, from the potential screening questions, three questions (Q5, Q6, Q7) related to movement were examined. Two of these three questions (Q5 and Q7) had acceptable test-retest reliabilities (weighted kappa values >0.6) and factor loadings (0.9), which indicated that 'pain with movement' was an important construct in patients with axSpA. Question 12 had low test-retest reliability (0.46), and both physicians and patients pointed out the ambiguity of the term "rest"; it was therefore excluded. In addition, the family history question drew mixed agreement from rheumatologists, and in our previous study, patients with mechanical back pain and patients with diagnosed axSpA who took part in cognitive interviews believed that the question needed specific examples [17].
The primary goal of this study was to develop a patient-friendly screener that reflects patient experiences. Considering the complexity of the US healthcare system and the challenges associated with implementation of a screening tool in primary care settings [15,46], evidence to ensure its usefulness in routine primary care practice will be important. In our previous work, we demonstrated that the proposed screener possesses the necessary properties to serve as an effective screening tool: it is brief, understandable at a grade 8 reading level, and can be self-administered [17]. Further, it should help clinicians to decide which of the screened patients will benefit from prompt rheumatology referral for evaluation and treatment.

Strengths and limitations
A notable strength of this study is that the study sample included individuals with any form of chronic back pain, all of whom were recruited from ResearchMatch. Thus, our results are more generalizable than those of previous studies that included only individuals with IBP. In addition, the proposed screening questions were informed by our previous work in which we explored issues related to patients' diagnostic journeys using evidence-based qualitative research methodology. This helped us to understand and incorporate patient and physician perspectives [15,17,45]. Our study also has several limitations. Presently, the screening tool is available only in the English language. Considering that participation was voluntary and responses to the surveys were self-reported, there may be self-selection bias and recall bias. In addition, since the presence of chronic back pain was self-reported, the lack of physician confirmation of the diagnosis may have introduced further potential bias into the study. Since axSpA is known to manifest differently in men and women, a sex-stratified analysis would have been ideal and informative. However, our study was not sufficiently powered to perform such an analysis. Further research to refine and validate the screening tool in multiple clinical settings, including family medicine, rheumatology, orthopedic, and spine clinics, is warranted.

Conclusions
Diagnostic delays in axSpA are significant, as patients often suffer with symptoms for years before being diagnosed, thereby contributing to the economic burden of this disease on patients, their caregivers, and the healthcare system. Early recognition and awareness of axSpA symptoms in the primary care setting, primarily those of IBP, is paramount to reduce disease burden and improve quality of life in these patients. To be feasible for implementation in routine clinical care, a screening tool must be brief and capture distinguishing features of axSpA. Further areas of exploration include accounting for gender and ethnic demographics. This study provides core information from which several versions of a screening tool for axSpA may be developed for validation in future studies. An evidence-based approach will be critical to demonstrate the validity and effectiveness of such a screening tool to support its widespread implementation in primary care.